Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection

Authors

  • Ngoc Tan Le
  • Fatiha Sadat
Abstract

Creating high-quality linguistic resources annotated with named entities is very costly in time and labour. Most standard corpora are available for English but not for under-resourced languages such as Vietnamese. For Asian languages, this task remains very difficult. This article addresses the automatic creation of named-entity-annotated corpora for Vietnamese-French, an under-resourced language pair. The approach applies a method based on cross-lingual projection using parallel corpora. Evaluations showed good performance (F-score of 94.90%) in recognizing named entity pairs in the parallel corpora, and thus in building a bilingual corpus annotated with named entities.
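As a rough illustration of what cross-lingual projection over a parallel corpus can look like, the sketch below copies entity labels from one side of an aligned sentence pair to the other and scores the result with an F-score. It is not the authors' implementation: the projection direction (French to Vietnamese), the label set, the toy sentence pair, the word alignment, and the function names `project_entities` and `f_score` are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def project_entities(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Copy entity labels from source tokens onto their aligned target tokens.

    `alignment` holds (source_index, target_index) pairs, as a word aligner
    would produce. Unaligned target tokens keep the label "O" (no entity).
    """
    projected = ["O"] * target_len
    for src_idx, tgt_idx in alignment:
        label = source_tags[src_idx]
        if label != "O":
            projected[tgt_idx] = label
    return projected


def f_score(predicted: Set[Tuple[int, str]], gold: Set[Tuple[int, str]]) -> float:
    """Harmonic mean of precision and recall over (position, label) items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sentence pair; the tags and alignment are invented for the example.
french = ["Ngoc", "Tan", "habite", "à", "Montréal"]
french_tags = ["PER", "PER", "O", "O", "LOC"]
vietnamese = ["Ngọc", "Tân", "sống", "ở", "Montréal"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

vietnamese_tags = project_entities(french_tags, alignment, len(vietnamese))

# Compare the projected labels against a hand-written (assumed) gold annotation.
gold_tags = ["PER", "PER", "O", "O", "LOC"]
predicted_items = {(i, t) for i, t in enumerate(vietnamese_tags) if t != "O"}
gold_items = {(i, t) for i, t in enumerate(gold_tags) if t != "O"}
score = f_score(predicted_items, gold_items)
```

In practice the alignment would come from an automatic word aligner and would be noisy, so a real pipeline would likely need filtering or repair steps that this sketch leaves out.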


Related resources

EVBCorpus - A Multi-Layer English-Vietnamese Bilingual Corpus for Studying Tasks in Comparative Linguistics

Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...


Sixth International Joint Conference on Natural Language Processing Proceedings of the 11th Workshop on Asian Language Resources

Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...


POS-Tagger for English-Vietnamese Bilingual Corpus

Corpus-based Natural Language Processing (NLP) tasks for such popular languages as English, French, etc. have been well studied with satisfactory achievements. In contrast, corpus-based NLP tasks for unpopular languages (e.g. Vietnamese) are at a deadlock due to absence of annotated training data for these languages. Furthermore, hand-annotation of even reasonably well-determined features such ...


Building English-Vietnamese Named Entity Corpus with Aligned Bilingual News Articles

Named entity recognition aims to classify words in a document into pre-defined target entity classes. It is now considered to be fundamental for many natural language processing tasks such as information retrieval, machine translation, information extraction and question answering. This paper presents a workflow to build an English-Vietnamese named entity corpus from an aligned bilingual corpus...


Named Entity Recognition for Vietnamese

Named Entity Recognition is an important task but is still relatively new for Vietnamese. It is partly due to the lack of a large annotated corpus. In this paper, we present a systematic approach in building a named entity annotated corpus while at the same time building rules to recognize Vietnamese named entities. The resulting open source system achieves an F-measure of 83%, which is better ...




Publication date: 2015